N.B. GAMLSS models cannot be fitted using bam or bamV. Big Data methods for GAMLSS models in mgcv are still under development.
We consider a simple data set containing data on grip strength (HG) vs age:
library(gamlss.data)
data(grip)
plot(grip ~ age, data = grip)
We fit a standard Gaussian GAM, and we convert it to mgcViz
library(mgcViz)
fit1 <- gamV(grip ~ s(age),
data = grip,
aViz = list(nsim = 50))
fit1 <- getViz(fit1, nsim = 50)
Then we check the residuals mean of a grid of values along age:
tmp <- check(fit1)
##
## Method: REML Optimizer: outer newton
## full convergence after 5 iterations.
## Gradient range [-0.001415483,0.001168618]
## (score 12186.55 & scale 37.74976).
## Hessian positive definite, eigenvalue range [1.553286,1882.003].
## Model rank = 10 / 10
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(age) 9.00 4.35 1 0.46
check1D(fit1, "age") + l_gridCheck1D()
Now we check the residuals standard deviation along
age
check1D(fit1, "age") + l_gridCheck1D( sd ) # <- notice that we are looking at stan dev
The variance is clearly increasing with
age. We take this into account using a gaulss model
fit2 <- gamV(list(grip ~ s(age), ~ s(age)),
data = grip,
family = gaulss,
aViz = list(nsim = 50))
and we repeat the check
check1D(fit2, "age") + l_gridCheck1D( sd )
The residual check now looks ok. These are the estimated mean and log stand. dev. effects
print(plot(fit2), pages = 1)
We looks at a residual QQ-plot
qq(fit2, CI = "normal")
The residuals look skewed to the right. We can also know check whether the residual skewness (asymmetry) changes with age
library(e1071)
check1D(fit2, "age") + l_gridCheck1D( skewness )
It seems that the skewness decreases with age. It would have been difficult to see this directly from the data
plot(grip ~ age, data = grip)
We allow for skewness by adopting a
shash model
library(mgcFam)
fit3 <- gamV(list(grip ~ s(age), # location
~ s(age), # scale (variance)
~ s(age), # skewness (asymmetry)
~ 1), # kurtosis (tail behaviour)
data = grip,
family = shash,
aViz = list(nsim = 50))
check1D(fit3, "age") + l_gridCheck1D( skewness )
The skewness seems to be better modelled now, and we achieve lower AIC:
AIC(fit1, fit2, fit3)
## df AIC
## fit1 6.603372 24369.57
## fit2 13.896755 24049.30
## fit3 15.237918 23873.78
The next step is to have a look at residual kurtosis (tail behaviour), this could be done by:
check1D(fit3, "age") + l_gridCheck1D( kurtosis )
but we leave it for next time.
–> –> –> –> –> –>
Let us consider the motorcycle data set:
library(MASS)
data(mcycle)
plot(mcycle)
A good Gaussian GAMLSS model for this data is
library(mgcViz)
fitG <- gamV(list(accel ~ s(times, k=20, bs="ad"),
~ s(times)),
data=mcycle,
family=gaulss())
print(plot(fitG), pages = 1)
We can obtain the quantiles by inverting the estimated Gaussian CDF
qu <- c(0.1, 0.25, 0.5, 0.75, 0.9)
q_hat <- lapply(qu,
function(.q)
qnorm(.q, mean = fitG$fitted.values[ , 1],
sd = 1 / fitG$fitted.values[ , 2]))
and we can then plot them on top of the data
plot(mcycle, ylim = c(-150, 75))
for(ii in 1:5) lines(mcycle$times, q_hat[[ii]], col = 2)
Now we fit a quantile GAM for quantile 0.9
fit09 <- qgam(accel ~ s(times, k=20, bs="ad"),
data = mcycle,
qu = 0.9)
## Estimating learning rate. Each dot corresponds to a loss evaluation.
## qu = 0.9....................done
We now plot the quantile with confidence intervals
pred <- predict(fit09, se = TRUE)
plot(mcycle, ylim = c(-140, 85))
lines(mcycle$times, pred$fit)
lines(mcycle$times, pred$fit + 2 * pred$se, col = 2)
lines(mcycle$times, pred$fit - 2 * pred$se, col = 2)
There are two problems with this fit:
times < 15.We address these issues by modelling the learning rate as well
fit09_2 <- qgam(list(accel ~ s(times, k=20, bs="ad"),
~ s(times)),
data = mcycle,
qu = 0.9)
## Estimating learning rate. Each dot corresponds to a loss evaluation.
## qu = 0.9..........done
Again predict and plot
pred <- predict(fit09_2, se = TRUE)
plot(mcycle, ylim = c(-140, 90))
lines(mcycle$times, pred$fit)
lines(mcycle$times, pred$fit + 2 * pred$se, col = 2)
lines(mcycle$times, pred$fit - 2 * pred$se, col = 2)
Looks much better! Recall that
class(fit09_2)
## [1] "qgam" "gam" "glm" "lm"
hence we can do, for instance
summary(fit09_2)
##
## Family: elf
## Link function: identity
##
## Formula:
## accel ~ s(times, k = 20, bs = "ad")
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.182 1.755 1.244 0.214
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(times) 11.05 12.66 393 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.676 Deviance explained = 97.3%
## -REML = 578.21 Scale est. = 1 n = 133
We can use mgcViz plots by converting as usual
fit09_2 <- getViz(fit09_2)
or using shortcut
fit09_2 <- qgamV(accel ~ ...
We fit multiple quantiles using the mqgam function
fitMQ <- mqgam(list(accel ~ s(times, k=20, bs="ad"),
~ s(times)),
data = mcycle,
qu = c(0.1, 0.25, 0.5, 0.75, 0.9))
## Estimating learning rate. Each dot corresponds to a loss evaluation.
## qu = 0.5..............done
## qu = 0.25..........................done
## qu = 0.75.........done
## qu = 0.1..............done
## qu = 0.9..........done
This should be manipulated using the qdo function
qdo(fitMQ, 0.1, summary)
##
## Family: elf
## Link function: identity
##
## Formula:
## accel ~ s(times, k = 20, bs = "ad")
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -54.535 2.367 -23.04 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(times) 8.323 9.153 932.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.649 Deviance explained = 89.9%
## -REML = 593.53 Scale est. = 1 n = 133
To plot one of the fitted quantile models we do
qdo(fitMQ, 0.1, plot)
To plot effect of several quantile at once we use
mgcViz
fitMQ_viz <- getViz(fitMQ)
plot(fitMQ_viz)
We can plot the fitted quantile on the data
plot(mcycle, ylim = c(-150, 90))
for(ii in 1:5) lines(mcycle$times, predict(fitMQ_viz[[ii]]), col = 2)
NB output of getViz is simply a list of gam models, we don’t need to use qdo, instead
predict(fitMQ)
# Error in UseMethod("predict") :
# no applicable method for 'predict' applied to an object of class "mqgam"
But there is a memory price to pay (small for this small data set)
object.size(fitMQ) / 1e6
object.size(fitMQ_viz) / 1e6
To recap we can do
fit <- qgam(...) and then fit <- getViz(fit), or fit <- qgamV(...) directlyfit <- mqgam(...) and we use qdo(fit, 0.1, plot)fit <- mqgam(...), then fit <- getViz(fit) and finally plot(fit[[1]])fit <- mqgamV(...) andHere we are going to model weekly rainfall (mm/week) from over 600 weather station in the Parana state of Brazil. The data comes comes from the INLA R package. The meaning of most variables should be evident.
library(testGam)
data("parana")
plot(parana[parana$TIME==1, ]$LO, parana[parana$TIME==1, ]$LA, xlab = "LO", ylab = "LA")
We fit a quantile GAM for \(\tau = 0.8\) with an isotropic effect for longitude and latitude, a cyclic effect for the time of the year and smooth effects for distance from the sea and elevation:
library(mgcViz)
fit <- qgamV(PREC ~ s(LO, LA, k = 25) + s(seaDist) + s(TIME, bs = "cc") + s(EL),
data = parana,
qu = 0.8)
## Estimating learning rate. Each dot corresponds to a loss evaluation.
## qu = 0.8............done
print(plot(fit), pages = 1)
Hence we can do model checking in mgcViz. However standard tools are inadequate
qq(fit, CI = "normal")
But we can verify number of observations falling below the fit
mean( (parana$PREC - fit$fitted.values) < 0 )
## [1] 0.803384
and mgcViz provides a specific layer for doing this along one covariate
check1D(fit, x = "EL") + l_gridQCheck1D()
or two covariates
check2D(fit, x1 = "LO", x2 = "LA") + l_gridQCheck2D()
We can of course fit this model to several quantiles:
fitM <- mqgamV(PREC ~ s(LO, LA, k = 25) + s(seaDist) + s(TIME, bs = "cc") + s(EL),
data = parana,
qu = seq(0.1, 0.9, length.out = 5))
## Estimating learning rate. Each dot corresponds to a loss evaluation.
## qu = 0.5................done
## qu = 0.3...................done
## qu = 0.7...............done
## qu = 0.1......................done
## qu = 0.9............done
plot(fitM, select = 1)
print(plot(fitM, select = 2:4), pages = 1)
Notice that the spatial effect seems much stronger for high quantiles than for the low ones. The same is true for distance from the sea and seasonality (
TIME), while the effect of elevation is not significant from qu = 0.9. We can visualize the spatial effect in 3D as follows:
plotRGL(sm(fitM[[5]], 1))
We might also wonder whether the spatial effect changes with time. To verify this here we construct a tensor product smooth between the 2D thin-plate-spline spatial effect and the cyclical effect of time. We simplify the model by removing the effect of seaDist.
fit4 <- qgamV(PREC ~ te(LO, LA, TIME, d = c(2, 1), k = c(20, 10), bs = c("tp", "cc")) + s(EL),
data = parana,
qu = 0.9)
## Estimating learning rate. Each dot corresponds to a loss evaluation.
## qu = 0.9........done
plotSlice(sm(fit4, 1),
fix = list("TIME" = round(seq(1, 53, length.out = 6)))) + l_fitRaster()
You can plot any slice in 3D by doing:
plotRGL(sm(fit4, 1), fix = c("TIME" = 11))